@khusmann
This PR adds support for collector-level na args (#532). This way, different lists of missing values can be specified for each column, overriding the global na arg in the call to vroom().

Example:

vroom(
  I("a,b,c\na,foo,REFUSED\nb,REFUSED,MISSING\nOMITTED,bar,OMITTED\n"),
  col_types = cols(
    a = col_character(na = "OMITTED"),
    b = col_character(na = "REFUSED"),
    c = col_character()
  ),
  na = "MISSING"
)
#> # A tibble: 3 × 3
#>   a     b     c      
#>   <chr> <chr> <chr>  
#> 1 a     foo   REFUSED
#> 2 b     NA    NA     
#> 3 NA    bar   OMITTED

Without this PR, it is very difficult to efficiently read columns with different lists of missing values. Instead, they have to be loaded as character vectors, then parsed with readr::parse_*() or readr::type_convert(). There are two problems with this:
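For comparison, here is a minimal sketch of that workaround, assuming readr is installed alongside vroom and reusing the input from the example above. The variable names `csv` and `raw` are only illustrative. Every column is first read as character with NA conversion switched off, then re-parsed one at a time with its own list of missing values:

```r
library(vroom)

# Same input as the example above.
csv <- I("a,b,c\na,foo,REFUSED\nb,REFUSED,MISSING\nOMITTED,bar,OMITTED\n")

# Step 1: read every column as character, with no strings treated as NA.
raw <- vroom(
  csv,
  col_types = cols(.default = col_character()),
  na = character()
)

# Step 2: re-parse each column with its own list of missing values.
# Column c falls back to what was the global na value in the example.
raw$a <- readr::parse_character(raw$a, na = "OMITTED")
raw$b <- readr::parse_character(raw$b, na = "REFUSED")
raw$c <- readr::parse_character(raw$c, na = "MISSING")
```

This should give the same columns as the example above, but it takes a second pass over the data after the initial read.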

I'm hoping you'll consider this PR for inclusion in vroom. It only requires a few changes, is 100% backwards compatible, and adds a feature that cannot otherwise be implemented in a separate package (without duplicating all of vroom's internals). Please let me know if there is anything more I can do to advocate for it. Thank you for your consideration!

@khusmann

Note that this is failing the check for windows-latest (3.6) because the runner is grabbing the latest version of evaluate, which now requires R >= 4.0.0.
